Incremental Gradient Algorithms with Stepsizes Bounded Away from Zero
Abstract
We consider the class of incremental gradient methods for minimizing a sum of continuously differentiable functions. An important novel feature of our analysis is that the stepsizes are kept bounded away from zero. We derive the first convergence results of any kind for this computationally important case. In particular, we show that a certain ε-approximate solution can be obtained and establish the linear dependence of ε on the stepsize limit. Incremental gradient methods are particularly well-suited for large neural network training problems where obtaining an approximate solution is typically sufficient and is often preferable to computing an exact solution. Thus, in the context of neural networks, the approach presented here is related to the principle of tolerant training. Our results justify numerous stepsize rules that were derived on the basis of extensive numerical experimentation but for which no theoretical analysis was previously available. In addition, convergence to (exact) stationary points is established when the gradient satisfies a certain growth property.
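The behaviour described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm or its stepsize rule: the function name `incremental_gradient`, the rule `max(initial_stepsize / (k + 1), stepsize_limit)`, and the two quadratic components are all assumptions chosen for illustration. It minimizes f(x) = 0.5(x - 1)^2 + 0.5(x + 1)^2, whose exact minimizer is x = 0, by stepping along one component gradient at a time while the stepsize stays at or above a positive limit.

```python
def incremental_gradient(component_grads, x0, stepsize_limit,
                         initial_stepsize=0.5, n_cycles=3000):
    """Cycle through the component gradients, one update per component.

    The stepsize shrinks at first but is never allowed below
    stepsize_limit (a hypothetical rule, used only for illustration).
    """
    x = x0
    for k in range(n_cycles):
        # Stepsize is held fixed within a cycle and bounded away from zero.
        step = max(initial_stepsize / (k + 1), stepsize_limit)
        for grad in component_grads:
            x = x - step * grad(x)
    return x


# Gradients of f_1(x) = 0.5 * (x - 1)^2 and f_2(x) = 0.5 * (x + 1)^2.
component_grads = [lambda x: x - 1.0, lambda x: x + 1.0]

for limit in (0.1, 0.01):
    x_final = incremental_gradient(component_grads, x0=5.0, stepsize_limit=limit)
    # The gradient of the full sum is 2 * x, so |2 * x_final| measures
    # how far the end-of-cycle iterate is from stationarity.
    print(limit, x_final, abs(2.0 * x_final))
```

For this particular problem, one full cycle with a constant stepsize a maps x to (1 - a)^2 x - a^2, so the end-of-cycle iterates settle at -a / (2 - a) rather than at 0. The residual gradient 2a / (2 - a) is roughly proportional to the stepsize limit, which is consistent with the linear dependence of ε on the stepsize limit stated in the abstract.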
Related articles
An Incremental Gradient(-Projection) Method with Momentum Term and Adaptive Stepsize Rule
We consider an incremental gradient method with momentum term for minimizing the sum of continuously differentiable functions. This method uses a new adaptive stepsize rule that decreases the stepsize whenever sufficient progress is not made. We show that if the gradients of the functions are bounded and Lipschitz continuous over a certain level set, then every cluster point of the iterates gen...
Analysis of gradient descent methods with non-diminishing, bounded errors
Implementations of stochastic gradient search algorithms such as back propagation typically rely on finite difference (FD) approximation methods. These methods are used to approximate the objective function gradient in steepest descent algorithms as well as the gradient and Hessian inverse in Newton based schemes. The convergence analyses of such schemes critically require that perturbation par...
On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning
We consider off-policy temporal-difference (TD) learning methods for policy evaluation in Markov decision processes with finite spaces and discounted reward criteria, and we present a collection of convergence results for several gradient-based TD algorithms with linear function approximation. The algorithms we analyze include: (i) two basic forms of two-time-scale gradient-based TD algorithms,...
Global convergence of the method of shortest residuals
The method of shortest residuals (SR) was presented by Hestenes and studied by Pitlak. If the function is quadratic, and if the line search is exact, then the SR method reduces to the linear conjugate gradient method. In this paper, we put forward the formulation of the SR method when the line search is inexact. We prove that, if stepsizes satisfy the strong Wolfe conditions, both the Fletcher-...
A Note on the Gradient Projection Method with Exact Stepsize Rule
In this paper, we give some convergence results on the gradient projection method with exact stepsize rule for solving the minimization problem with convex constraints. Especially, we show that if the objective function is convex and its gradient is Lipschitz continuous, then the whole sequence of iterations produced by this method with bounded exact stepsizes converges to a solution of the con...
Journal:
- Comp. Opt. and Appl.
Volume 11
Publication date: 1998